79 research outputs found

    Automatic Induction of Classification Rules from Examples Using N-Prism

    www.dis.port.ac.uk/~bramerma
    One of the key technologies of data mining is the automatic induction of rules from examples, particularly the induction of classification rules. Most work in this field has concentrated on the generation of such rules in the intermediate form of decision trees. An alternative approach is to generate modular classification rules directly from the examples. This paper seeks to establish a revised form of the rule generation algorithm Prism as a credible candidate for use in the automatic induction of classification rules from examples in practical domains where noise may be present and where predicting the classification for previously unseen instances is the primary focus of attention.

    A scalable expressive ensemble learning using Random Prism: a MapReduce approach

    The induction of classification rules from previously unseen examples is one of the most important data mining tasks in science as well as commercial applications. In order to reduce the influence of noise in the data, ensemble learners are often applied. However, most ensemble learners are based on decision tree classifiers, which are affected by noise. The Random Prism classifier has recently been proposed as an alternative to the popular Random Forests classifier, which is based on decision trees. Random Prism is based on the Prism family of algorithms, which is more robust to noise. However, like most ensemble classification approaches, Random Prism does not scale well on large training data. This paper presents a thorough discussion of Random Prism and a recently proposed parallel version of it called Parallel Random Prism. Parallel Random Prism is based on the MapReduce programming paradigm. The paper provides, for the first time, a novel theoretical analysis of the proposed technique and an in-depth experimental study, which show that Parallel Random Prism scales well on a large number of training examples, a large number of data features and a large number of processors. The expressiveness of the decision rules that our technique produces makes it a natural choice for Big Data applications where informed decision making increases the user’s trust in the system.

    An Overview of the Use of Neural Networks for Data Mining Tasks

    In recent years the area of data mining has experienced considerable demand for technologies that extract knowledge from large and complex data sources. There is substantial commercial interest, as well as research activity, aimed at developing new and improved approaches for extracting information, relationships, and patterns from datasets. Artificial Neural Networks (NN) are popular biologically inspired intelligent methodologies whose classification, prediction and pattern recognition capabilities have been utilised successfully in many areas, including science, engineering, medicine, business, banking, telecommunication, and many other fields. This paper highlights, from a data mining perspective, the implementation of NN using supervised and unsupervised learning for pattern recognition, classification, prediction and cluster analysis, and focuses the discussion on their usage in bioinformatics and financial data analysis tasks.

    The global epidemiology of hepatitis E virus infection: A systematic review and meta-analysis

    Background and aims: Hepatitis E virus (HEV), as an emerging zoonotic pathogen, is a leading cause of acute viral hepatitis worldwide, with a high risk of developing chronic infection in immunocompromised patients. However, the global epidemiology of HEV infection has not been comprehensively assessed. This study aims to map the global prevalence and identify the risk factors of HEV infection by performing a systematic review and meta-analysis. Methods: A systematic search of articles published in the Medline, Embase, Web of Science, Cochrane and Google Scholar databases up to July 2019 was conducted to identify studies with HEV prevalence data. Pooled prevalence among different countries and continents was estimated. HEV IgG seroprevalence of subgroups was compared and risk factors for HEV infection were evaluated using odds ratios (OR). Results: We identified 419 related studies comprising 1 519 872 individuals. A total of 1 099 717 participants pooled from 287 studies of the general population gave an estimated global anti-HEV IgG seroprevalence of 12.47% (95% CI 10.42-14.67; I² = 100%). Notably, the use of ELISA kits from different manufacturers has a substantial impact on the global estimation of anti-HEV IgG seroprevalence. The pooled estimate of anti-HEV IgM seroprevalence based on 98 studies is 1.47% (95% CI 1.14-1.85; I² = 99%). The overall estimate of the HEV viral RNA-positive rate in the general population is 0.20% (95% CI 0.15-0.25; I² = 98%). Consumption of raw meat (P = .0001), exposure to soil (P < .0001), blood transfusion (P = .0138), travelling to endemic areas (P = .0244), contact with dogs (P = .0416), living in rural areas (P = .0349) and having less than an elementary school education (P < .0001) were identified as risk factors for anti-HEV IgG positivity. Conclusions: Globally, approximately 939 million individuals, corresponding to 1 in 8, have ever experienced HEV infection. 15-110 million individuals have recent or ongoing HEV infection. Our study highlights the substantial burden of HEV infection and calls for increasing routine screening and preventive measures.

    Direct-acting antiviral agents for liver transplant recipients with recurrent genotype 1 hepatitis C virus infection: Systematic review and meta-analysis

    Background: Comprehensive evaluation of the safety and efficacy of different combinations of direct-acting antivirals (DAAs) in liver transplant recipients with genotype 1 (GT1) hepatitis C virus (HCV) recurrence remains limited. Therefore, we performed this systematic review and meta-analysis in order to evaluate the clinical outcome of DAA treatment in liver transplant patients with HCV GT1 recurrence. Methods: Studies were included if they contained information on sustained virologic response 12 weeks (SVR12) after DAA treatment completion as well as treatment-related complications for liver transplant recipients with GT1 HCV recurrence. Results: We identified 16 studies comprising 885 patients. The overall pooled estimated proportion of SVR12 was 93% (95% confidence interval (CI): 0.89, 0.96), with moderate heterogeneity observed (τ² = 0.01, P < 0.01, I² = 75%). High tolerability was observed in liver transplant recipients, reflected by a pooled estimated proportion of serious adverse events (sAEs) of 4% (95% CI: 0.01, 0.07; τ² = 0.02, P < 0.01, I² = 81%). For subgroup analysis, a total of five different DAA regimens were applied for treating these patients. Sofosbuvir/Ledipasvir (SOF/LDV) had the highest pooled estimated SVR12 proportion, followed by Paritaprevir/Ritonavir/Ombitasvir/Dasabuvir (PrOD), Daclatasvir (DCV)/Simeprevir (SMV) ± Ribavirin (RBV), SOF/SMV ± RBV, and Asunaprevir (ASV)/DCV. There was a tendency towards a higher pooled SVR12 proportion in patients with METAVIR stage F0-F2, at 97% (95% CI: 0.93, 0.99), compared to 85% (95% CI: 0.79, 0.90) for stage F3-F4 (P < 0.01). There was no significant difference between liver transplant recipients treated with or without RBV (P = 0.23). Conclusions: Direct-acting antiviral treatment is highly effective and well-tolerated in liver transplant recipients with recurrent GT1 HCV infection.

    Tobacco use induces anti-apoptotic, proliferative patterns of gene expression in circulating leukocytes of Caucasian males

    Background: Strong epidemiologic evidence correlates tobacco use with a variety of serious adverse health effects, but the biological mechanisms that produce these effects remain elusive. Results: We analyzed gene transcription data to identify expression spectra related to tobacco use in circulating leukocytes of 67 Caucasian male subjects. Levels of cotinine, a nicotine metabolite, were used as a surrogate marker for tobacco exposure. Significance Analysis of Microarray and Gene Set Analysis identified 109 genes in 16 gene sets whose transcription levels were differentially regulated by nicotine exposure. We subsequently analyzed this gene set by hyperclustering, a technique that allows the data to be clustered by both expression ratio and gene annotation (e.g. Gene Ontologies). Conclusion: Our results demonstrate that tobacco use affects transcription of groups of genes that are involved in proliferation and apoptosis in circulating leukocytes. These transcriptional effects include a repertoire of transcriptional changes likely to increase the incidence of neoplasia through an altered expression of genes associated with transcription and signaling, interferon responses and repression of apoptotic pathways.

    The global impact of non-communicable diseases on macro-economic productivity: a systematic review

    © 2015, The Author(s). Non-communicable diseases (NCDs) have a large economic impact at multiple levels. We systematically reviewed the literature investigating the economic impact of NCDs [including coronary heart disease (CHD), stroke, type 2 diabetes mellitus (DM), cancer (lung, colon, cervical and breast), chronic obstructive pulmonary disease (COPD) and chronic kidney disease (CKD)] on macro-economic productivity. We performed a systematic search, up to November 6th 2014, of medical databases (Medline, Embase and Google Scholar) without language restrictions. To identify additional publications, we searched the reference lists of retrieved studies and contacted authors in the field. Randomized controlled trials, cohort, case–control, cross-sectional, ecological studies and modelling studies carried out in adults (>18 years old) were included. Two independent reviewers performed all abstract and full text selection. Disagreements were resolved through consensus or by consulting a third reviewer. Two independent reviewers extracted data using a predesigned data collection form. The main outcome measure was the impact of the selected NCDs on productivity, measured in DALYs, productivity costs, and labor market participation, including unemployment, return to work and sick leave. From 4542 references, 126 studies met the inclusion criteria, many of which focused on the impact of more than one NCD on productivity. Breast cancer was the most common (n = 45), followed by stroke (n = 31), COPD (n = 24), colon cancer (n = 24), DM (n = 22), lung cancer (n = 16), CVD (n = 15), cervical cancer (n = 7) and CKD (n = 2). Four studies were from the WHO African Region, 52 from the European Region, 53 from the Region of the Americas, 16 from the Western Pacific Region, one from the Eastern Mediterranean Region and none from South East Asia. We found large regional differences in DALYs attributable to NCDs, especially for cervical and lung cancer. Productivity losses in the USA ranged from 88 million US dollars (USD) for COPD to 20.9 billion USD for colon cancer. CHD costs the Australian economy 13.2 billion USD per year. People with DM, COPD and survivors of breast and especially lung cancer are at a higher risk of reduced labor market participation. Overall, NCDs generate a large impact on macro-economic productivity in most WHO regions irrespective of continent and income. The absolute global impact in terms of dollars and DALYs remains an elusive challenge due to the wide heterogeneity in the included studies as well as limited information from low- and middle-income countries. WHO; Nestlé Nutrition (Nestec Ltd.); Metagenics Inc.; and AX